How to select a good voice for TTS

نویسنده

Sunhee Kim

چکیده

Even though the perceived quality of a speaker’s natural voice does not necessarily guarantee the quality of synthesized speech, it is required to select a certain number of candidates based on their natural voice before moving to the evaluation stage of synthesized sentences. This paper describes a male speaker selection procedure for unit selection synthesis systems in English and Japanese based on perceptive evaluation and acoustic measurements of the speakers’ natural voice. A perceptive evaluation is performed on eight professional voice talents of each language. A total of twenty native-speaker listeners are recruited in both languages and each listener is asked to rate on eight analytical factors by using a five-scale score and rank three best speakers. Acoustic measurement focuses on the voice quality by extracting two measures from Long Term Average Spectrum (LTAS), the socalled Speakers Formant (SPF), which corresponds to the peak intensity between 3 kHz and 4 kHz, and the Alpha ratio (AR), which is the lower level difference between 0 and 1 kHz and 1 and 4 kHz ranges. The perceptive evaluation results show a very strong correlation between the total score and the preference in both languages, 0.9183 in English and 0.8589 in Japanese. The correlations between the perceptive evaluation and acoustic measurements are moderate with respect to SPF and AR, 0.473 and -0.494 in English, and 0.288 and -0.263 in Japanese.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Dialog speech acts and prosody: Considerations for TTS

As natural language dialog systems involving both speech recognition and text-to-speech (TTS) synthesis become more sophisticated, the limitations of general-purpose TTS for human-computer dialogs have become more apparent. Much subtlety and complexity of meaning in natural language dialogs is conveyed by prosody; how something is said is often as important as what words are spoken. At the same...

متن کامل

MARY TTS unit selection and HMM-based voices

This paper describes the implementation of a unit selection English voice and a HMM-based Hindi voice for our participation in the Blizzard Challenge 2013. The two voices have been created using the MARY TTS voice building framework. We describe how audiobook data is used to create the English voice and how a quality controlmeasure (statisticalmodel cost) is used to control the selection of uni...

متن کامل

Unit selection based on voice recognition

In this paper, we describe a perceptual voice recognition method to improve the naturalness of synthesized speech for Mandarin Chinese text-to-speech (TTS) baseline system. As a large TTS speech corpus, speech data always has different acoustic properties for different data recording conditions. Speech data recorded under different conditions can finally influence the naturalness of synthesized...

متن کامل

Multilingual MARY TTS participation in the Blizzard Challenge 2009

The paper describes the Blizzard Challenge 2009 participation of MARY TTS, an open-source TTS system using a unit selection voice. We briefly outline the new language support framework we provide so that people can add support for their languages to MARY TTS, and describe how that framework was used for building a Mandarin Chinese system and voice. The system performs well for English and reaso...

متن کامل

How (not) to select your voice corpus: random selection vs. phonologically balanced

This paper compares the effect of two different voice corpus selection methods on the overall quality of unit selection-based text-to-speech (TTS) voices resulting from training on these corpora. The first selectionmethod aims to maximize the coverage of stressed as well as unstressed diphones (phonologically balanced: Phonbal) while the second method simply selects sentences at random (Random)...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2016

How to select a good voice for TTS

نویسنده

چکیده

منابع مشابه

Dialog speech acts and prosody: Considerations for TTS

MARY TTS unit selection and HMM-based voices

Unit selection based on voice recognition

Multilingual MARY TTS participation in the Blizzard Challenge 2009

How (not) to select your voice corpus: random selection vs. phonologically balanced

عنوان ژورنال:

اشتراک گذاری